Death and Suicide in Universal Artificial Intelligence
Reinforcement learning (RL) is a general paradigm for studying intelligent
behaviour, with applications ranging from artificial intelligence to psychology
and economics. AIXI is a universal solution to the RL problem; it can learn any
computable environment. A technical subtlety of AIXI is that it is defined
using a mixture over semimeasures that need not sum to 1, rather than over
proper probability measures. In this work we argue that the shortfall of a
semimeasure can naturally be interpreted as the agent's estimate of the
probability of its death. We formally define death for generally intelligent
agents like AIXI, and prove a number of related theorems about their behaviour.
Notable discoveries include that agent behaviour can change radically under
positive linear transformations of the reward signal (from suicidal to
dogmatically self-preserving), and that the agent's posterior belief that it
will survive increases over time.
Comment: Conference: Artificial General Intelligence (AGI) 2016, 13 pages, 2 figures
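The shortfall interpretation in the abstract above can be illustrated with a minimal one-step sketch. The percept names, the probabilities, and the convention that death contributes zero reward are assumptions made for illustration only; the helper names `death_probability`, `expected_reward`, and `best_action` are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def death_probability(semimeasure):
    """Return the shortfall 1 - (total mass) of a one-step semimeasure."""
    total = sum(semimeasure.values())
    if total > 1:
        raise ValueError("not a semimeasure: total mass exceeds 1")
    return 1 - total


def expected_reward(semimeasure, reward):
    """Expected one-step reward; the missing mass (death) contributes 0 (assumption)."""
    return sum(p * reward[percept] for percept, p in semimeasure.items())


def best_action(predictions, reward):
    """Pick the action with the highest expected one-step reward."""
    return max(predictions, key=lambda a: expected_reward(predictions[a], reward))


# Hypothetical predictions: "safe" never dies, "risky" dies with probability 1/2.
predictions = {
    "safe": {"x": Fraction(1)},
    "risky": {"x": Fraction(1, 2)},
}

reward = {"x": Fraction(1, 2)}     # rewards in [0, 1]
shifted = {"x": Fraction(-1, 2)}   # same rewards shifted by -1, now in [-1, 0]

print(death_probability(predictions["risky"]))  # 1/2
print(best_action(predictions, reward))         # safe
print(best_action(predictions, shifted))        # risky
```

In this toy setting, shifting every reward by -1 flips the preferred action from the one that avoids death to the one that risks it, which mirrors the sensitivity to positive linear transformations that the abstract describes.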
Free Lunch for Optimisation under the Universal Distribution
Function optimisation is a major challenge in computer science. The No Free
Lunch theorems state that if all functions with the same histogram are assumed
to be equally probable then no algorithm outperforms any other in expectation.
We argue against the uniform assumption and suggest a universal prior exists
for which there is a free lunch, but where no particular class of functions is
favoured over another. We also prove upper and lower bounds on the size of the
free lunch.
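The uniform assumption behind the No Free Lunch statement above can be checked by brute force on a tiny domain. This is only a sketch: it compares two fixed, non-repeating search orders rather than arbitrary adaptive algorithms, and the performance measure (best value seen after k evaluations) and the names `best_after_k` and `mean_performance` are assumptions chosen for illustration.

```python
from itertools import product


def best_after_k(f, order, k):
    """Best function value seen after evaluating the first k points of `order`."""
    return max(f[x] for x in order[:k])


def mean_performance(order, k, domain_size=3, values=(0, 1)):
    """Average `best_after_k` over every function from the domain to `values`."""
    functions = list(product(values, repeat=domain_size))
    return sum(best_after_k(f, order, k) for f in functions) / len(functions)


# Two different search orders over the domain {0, 1, 2}.
print(mean_performance((0, 1, 2), k=2))  # 0.75
print(mean_performance((2, 0, 1), k=2))  # 0.75
```

Under a non-uniform prior over functions, the two averages need not coincide, which is the kind of gap the abstract refers to as a free lunch.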
Towards Safe Artificial General Intelligence
The field of artificial intelligence has recently experienced a
number of breakthroughs thanks to progress in deep learning and
reinforcement learning. Computer algorithms now outperform humans
at Go, Jeopardy, image classification, and lip reading, and are
becoming very competent at driving cars and interpreting natural
language. The rapid development has led many to conjecture that
artificial intelligence with greater-than-human ability on a wide
range of tasks may not be far off. This in turn raises concerns
about whether we know how to control such systems, should we
succeed in building them.
Indeed, if humanity were to find itself in conflict with a system
of much greater intelligence than itself, then human society
would likely lose. One way to make sure we avoid such a conflict
is to ensure that any future AI system with potentially
greater-than-human intelligence has goals that are aligned with
the goals of the rest of humanity. For example, it should not
wish to kill humans or steal their resources.
The main focus of this thesis will therefore be goal alignment,
i.e. how to design artificially intelligent agents with goals
coinciding with the goals of their designers. Focus will mainly
be directed towards variants of reinforcement learning, as
reinforcement learning currently seems to be the most promising
path towards powerful artificial intelligence. We identify and
categorize goal misalignment problems in reinforcement learning
agents as designed today, and give examples of how these agents
may cause catastrophes in the future. We also suggest a number of
reasonably modest modifications that can be used to avoid or
mitigate each identified misalignment problem. Finally, we
study various choices of decision algorithms, and conditions for
when a powerful reinforcement learning system will permit us to
shut it down.
The central conclusion is that while reinforcement learning
systems as designed today are inherently unsafe to scale to human
levels of intelligence, there are ways to potentially address
many of these issues without straying too far from the currently
successful reinforcement learning paradigm. However, much work
remains in turning the high-level proposals suggested in this
thesis into practical algorithms.
Count-Based Exploration in Feature Space for Reinforcement Learning
We introduce a new count-based optimistic exploration algorithm for
Reinforcement Learning (RL) that is feasible in environments with
high-dimensional state-action spaces. The success of RL algorithms in these
domains depends crucially on generalisation from limited training experience.
Function approximation techniques enable RL agents to generalise in order to
estimate the value of unvisited states, but at present few methods enable
generalisation regarding uncertainty. This has prevented the combination of
scalable RL algorithms with efficient exploration strategies that drive the
agent to reduce its uncertainty. We present a new method for computing a
generalised state visit-count, which allows the agent to estimate the
uncertainty associated with any state. Our \phi-pseudocount achieves
generalisation by exploiting the same feature representation of the state space
that is used for value function approximation. States that have less frequently
observed features are deemed more uncertain. The \phi-Exploration-Bonus
algorithm rewards the agent for exploring in feature space rather than in the
untransformed state space. The method is simpler and less computationally
expensive than some previous proposals, and achieves near state-of-the-art
results on high-dimensional RL benchmarks.
Comment: Conference: Twenty-sixth International Joint Conference on Artificial
Intelligence (IJCAI-17), 8 pages, 1 figure
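The idea of counting in feature space can be sketched as follows. The abstract does not give the formulas, so everything specific here is an assumption: the density model is a product of per-feature smoothed frequencies, the pseudo-count is derived from the density before and after one extra observation, and the bonus is `beta / sqrt(pseudo_count)`. The class name `PhiPseudoCount` and the parameter `beta` are hypothetical.

```python
import math
from collections import Counter


class PhiPseudoCount:
    """Generalised visit count computed from per-feature value counts (sketch)."""

    def __init__(self, n_features):
        self.t = 0  # number of feature vectors observed so far
        self.counts = [Counter() for _ in range(n_features)]

    def density(self, phi, extra=0):
        """Product of smoothed per-feature frequencies, as if `phi` had been
        observed `extra` additional times (assumes independent features)."""
        rho = 1.0
        for counter, value in zip(self.counts, phi):
            rho *= (counter[value] + extra + 0.5) / (self.t + extra + 1)
        return rho

    def pseudo_count(self, phi):
        """Pseudo-count from the density before and after one more visit."""
        rho = self.density(phi)
        rho_next = self.density(phi, extra=1)
        return rho * (1.0 - rho_next) / (rho_next - rho)

    def bonus(self, phi, beta=0.05):
        """Exploration bonus that shrinks as the pseudo-count grows."""
        return beta / math.sqrt(self.pseudo_count(phi))

    def update(self, phi):
        """Record one observation of the feature vector `phi`."""
        for counter, value in zip(self.counts, phi):
            counter[value] += 1
        self.t += 1


counter = PhiPseudoCount(n_features=2)
for _ in range(10):
    counter.update((1, 0))

# A feature vector sharing no values with past observations earns a larger bonus.
print(counter.bonus((0, 1)) > counter.bonus((1, 0)))  # True
```

Because the count is built from features rather than raw states, a state never visited before still receives a small bonus if its features have been seen often.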
A Game-Theoretic Analysis of the Off-Switch Game
The off-switch game is a game-theoretic model of a highly intelligent robot
interacting with a human. In the original paper by Hadfield-Menell et al.
(2016), the analysis is not fully game-theoretic as the human is modelled as an
irrational player, and the robot's best action is only calculated under
unrealistic normality and soft-max assumptions. In this paper, we make the
analysis fully game-theoretic by modelling the human as a rational player with
a random utility function. As a consequence, we can easily calculate
the robot's best action for arbitrary belief and irrationality assumptions.
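A rough sketch of how a robot's best action might be computed in an off-switch setting is given below. It is not the analysis from the paper: the three options (act, switch off, defer to the human), the discrete belief over the utility of acting, and the error model in which the human makes the wrong call with probability `error_rate` are all assumptions made for illustration, and the function names are hypothetical.

```python
def option_values(belief, error_rate=0.0):
    """Expected utility of each robot option.

    `belief` maps a possible utility u of the robot's action to its probability.
    A correct human permits the action when u > 0 and switches the robot off
    otherwise; with probability `error_rate` the human does the opposite.
    """
    act = sum(p * u for u, p in belief.items())
    defer = sum(
        p * ((1 - error_rate) * max(u, 0) + error_rate * min(u, 0))
        for u, p in belief.items()
    )
    return {"act": act, "off": 0.0, "defer": defer}


def best_option(belief, error_rate=0.0):
    """Return the option with the highest expected utility."""
    values = option_values(belief, error_rate)
    return max(values, key=values.get)


# Hypothetical belief: the action is harmful (-2) or helpful (+1), equally likely.
belief = {-2: 0.5, 1: 0.5}

print(best_option(belief))                  # defer
print(best_option(belief, error_rate=0.5))  # off
```

With a reliable human, deferring filters out the harmful case and beats both alternatives; once the human errs often enough, deferring loses that advantage and switching off becomes preferable in this example.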
How RL Agents Behave When Their Actions Are Modified
Reinforcement learning in complex environments may require supervision to
prevent the agent from attempting dangerous actions. As a result of supervisor
intervention, the executed action may differ from the action specified by the
policy. How does this affect learning? We present the Modified-Action Markov
Decision Process, an extension of the MDP model that allows actions to differ
from the policy. We analyze the asymptotic behaviours of common reinforcement
learning algorithms in this setting and show that they adapt in different ways:
some completely ignore modifications while others go to various lengths in
trying to avoid action modifications that decrease reward. By choosing the
right algorithm, developers can prevent their agents from learning to
circumvent interruptions or constraints, and better control agent responses to
other kinds of action modification, like self-damage.
Comment: 10 pages (+6 appendix); 7 figures. Published in the AAAI 2021
Conference on AI. Code is available at https://github.com/edlanglois/mamd
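The interaction described in the abstract above, where the executed action may differ from the one the policy chose, can be sketched as a single environment step. This is a simplified illustration rather than the paper's formal definition: the modifier here sees only the state and the proposed action, and the toy corridor environment, the supervisor rule, and all function names are hypothetical.

```python
def modified_step(state, policy, modifier, transition):
    """One step in which a supervisor may replace the policy's action."""
    proposed = policy(state)
    executed = modifier(state, proposed)  # may differ from `proposed`
    next_state, reward = transition(state, executed)
    return proposed, executed, next_state, reward


# Toy corridor with states 0..3; state 3 is treated as dangerous.
def always_right(state):
    return "right"


def supervisor(state, action):
    # Override any move that would enter the dangerous state.
    return "left" if state == 2 and action == "right" else action


def corridor(state, action):
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    reward = -10.0 if next_state == 3 else 1.0
    return next_state, reward


print(modified_step(1, always_right, supervisor, corridor))
# ('right', 'right', 2, 1.0)
print(modified_step(2, always_right, supervisor, corridor))
# ('right', 'left', 1, 1.0)
```

A learning algorithm can then be updated with either the proposed or the executed action, and that choice is one place where the different responses to modification mentioned in the abstract could arise.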